Improving Prediction Accuracy of Tumor Classification by Re-using the Discarded Genes during Gene Selection
Authors
Abstract
Background: Because the high dimensionality of gene expression microarray data sets hurts the generalization performance of classifiers, feature selection, which keeps relevant features and discards irrelevant and redundant ones, is widely used in bioinformatics. Redundant features nevertheless contain useful information, and multi-task learning is a novel technique that improves the prediction accuracy of tumor classification by re-using these discarded redundant features. This raises the question of which features should be discarded and which should be used as inputs or as outputs. Previous work searched for features with heuristic methods, and the numbers of features used as input, used as output, and discarded were determined arbitrarily. Results: We demonstrate a framework that uses a genetic algorithm to automatically assign features as input, output, or discarded. Two algorithms are proposed: GA-MTL (genetic algorithm based multi-task learning) and e-GA-MTL (an enhanced version of GA-MTL). Experimental results show that the proposed framework selects features effectively for multi-task learning, and that GA-MTL and e-GA-MTL perform better than other heuristic methods. Conclusions: The genetic algorithm is a powerful technique for automatically selecting features for multi-task learning, and the proposed GA-MTL and e-GA-MTL are two alternative algorithms for improving the generalization performance of classifiers in the analysis of microarray data sets.
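The abstract does not give implementation details, so the following is only a minimal sketch of how a genetic algorithm could assign each feature one of the three roles. The ternary encoding, the operators (tournament selection, one-point crossover, per-gene mutation), every function name, and the toy fitness are assumptions made for illustration, not the authors' GA-MTL. In a real setting the fitness would presumably be the validation accuracy of a multi-task learner trained with the chosen inputs and extra outputs.

```python
import random

# Hypothetical role codes: each gene (feature) gets exactly one role.
DISCARD, INPUT, OUTPUT = 0, 1, 2
ROLES = (DISCARD, INPUT, OUTPUT)


def decode(chromosome):
    """Group feature indices by the role the chromosome assigns them."""
    groups = {DISCARD: [], INPUT: [], OUTPUT: []}
    for index, role in enumerate(chromosome):
        groups[role].append(index)
    return groups


def evolve(fitness, n_features, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Generational GA over ternary chromosomes; returns the best one seen."""
    rng = random.Random(seed)
    population = [[rng.choice(ROLES) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            mother, father = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Each position is re-drawn at random with a small probability.
            child = [rng.choice(ROLES) if rng.random() < mutation_rate else role
                     for role in child]
            children.append(child)
        population = children
        # Keep the best chromosome seen so far, so it never gets worse.
        best = max(population + [best], key=fitness)
    return best


def toy_fitness(chromosome):
    """Stand-in score: agreement with a made-up 'ideal' role assignment."""
    target = [INPUT, INPUT, OUTPUT, OUTPUT, DISCARD, DISCARD, DISCARD, DISCARD]
    return sum(a == b for a, b in zip(chromosome, target))


best = evolve(toy_fitness, n_features=8)
print(decode(best))
```

With this encoding the number of input, output, and discarded features is not fixed in advance; it follows from whichever chromosome scores best, which is the point the abstract makes against arbitrarily chosen counts.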
Similar resources
SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy
In this paper, we propose a new gene selection algorithm based on the Shuffled Frog Leaping Algorithm, called SFLA-FS. The proposed algorithm is used to improve cancer classification accuracy. Most biological datasets, such as cancer datasets, have a large number of genes and few samples. However, most of these genes are not useful for some tasks, for example cancer classification....
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression gives access to a wealth of information with thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of differences between tumors. In this study we try to reduce the high-dimensional data with a statistical method, selecting valuable high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...
Prediction of blood cancer using leukemia gene expression data and sparsity-based gene selection methods
Background: DNA microarray is a useful technology that simultaneously assesses the expression of thousands of genes. It can be utilized for the detection of cancer types and cancer biomarkers. This study aimed to predict blood cancer using leukemia gene expression data and a robust ℓ2,p-norm sparsity-based gene selection method. Materials and Methods: In this descriptive study, the microarray ...
Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest
Background & objective: Microarray and next generation sequencing (NGS) data are important sources for finding useful molecular patterns. Also, the large volume of gene expression data increases the challenge of identifying the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...
Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets
With the advancement of metagenome data mining, science has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, the process of gene selection or feature selection is essential. Removing redundant genes can therefore increase the speed and accuracy of classification. After applying the gene se...